The Principal Components Method as a Pre-processing Stage for Decision Tree Learning
Author
Abstract
In this paper we use the principal components method as a pre-processing stage for decision tree learners. We investigate the case when query examples are known in advance. We have evaluated an easily computable criterion for choosing the optimal number of principal components. Our experimental results show that a decrease in error rate can be achieved without an increase in hypothesis complexity (the size of a decision tree). On some datasets the method competes even with C5.0 with boosting. We also show that the required time does not significantly exceed the time needed for learning a decision tree without the use of principal components. We also describe experiments with other propositional learners. Modest gains have been observed for the more common case when the query examples are not known in advance.

1 Motivation

A serious drawback of decision tree learners like C4.5 [12] is their very restricted language. In the process of knowledge discovery in databases (KDD) it is very often the pre-processing stage, namely transformation and reduction of data, that plays an important role. A good transformation/reduction technique may result in new attributes that are more appropriate for a data mining algorithm. A wide range of constructive induction techniques [6, 7, 9, 15] has been explored, aiming at enriching the language of propositional learners [4, 5]. The usual way is to add new attributes that are computed from existing ones using a priori given functions. One important class of functions includes linear combinations of numerical attributes. To prevent a large increase in computational complexity and to prevent overfitting we cannot add just any linear combination. Principal components [11, 13] are useful linear functions that capture linear dependencies among attributes and hence deserve particular attention.

The method presented can be seen as a multistrategy approach [5] consisting of two steps. In the first step we compute the principal components from numerical attributes. In the second step we use these principal components either to replace the original attributes or to add to them. The first step is actually a form of constructive induction [4] applicable to numerical variables, which uses the methodology
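The two steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`column_means`, `covariance_matrix`, `top_components`, `add_principal_components`) are hypothetical, the eigenvectors are found by simple power iteration with deflation rather than by whatever routine the paper used, and the number of components `k` is passed in by hand instead of being chosen by the paper's criterion. When query examples are known in advance, one plausible reading is that their (unlabelled) attribute vectors are included in `rows` before the components are computed; that reading is an assumption.

```python
import math
import random

def column_means(rows):
    """Mean of each numerical attribute (column)."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance_matrix(rows, means):
    """Sample covariance matrix of the attributes."""
    n, d = len(rows), len(means)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [r[j] - means[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += c[a] * c[b]
    return [[v / (n - 1) for v in row] for row in cov]

def top_components(cov, k, iters=500, seed=0):
    """Return k (eigenvalue, unit eigenvector) pairs, largest eigenvalue first.

    Uses power iteration with deflation (an illustrative choice).
    """
    rng = random.Random(seed)
    d = len(cov)
    m = [row[:] for row in cov]          # working copy that gets deflated
    found = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:              # no variance left in any direction
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(m[a][b] * v[b] for b in range(d)) for a in range(d))
        found.append((lam, v))
        for a in range(d):               # remove the component just found
            for b in range(d):
                m[a][b] -= lam * v[a] * v[b]
    return found

def add_principal_components(rows, k, replace=False):
    """Step 1: compute k principal components. Step 2: add them to, or
    substitute them for, the original numerical attributes."""
    means = column_means(rows)
    comps = top_components(covariance_matrix(rows, means), k)
    result = []
    for r in rows:
        c = [r[j] - means[j] for j in range(len(means))]
        scores = [sum(ci * vi for ci, vi in zip(c, v)) for _, v in comps]
        result.append(scores if replace else list(r) + scores)
    return result
```

The table returned by `add_principal_components` would then be handed to any decision tree learner in place of the original attribute table; with `replace=True` only the component scores are kept.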
Similar resources
The Principal Components Method as a Pre-processing Stage for Decision Tree Learning
In this paper we use the principal components method as a pre-processing stage for decision tree learners. We investigate the case when query examples are known in advance. We have evaluated an easily computable criterion for choosing the optimal number of principal components. Our experimental results show that a decrease in error rate can be achieved without an increase in hypothesis complexity...
Eco-Efficiency Evaluation in Two-Stage Network Structure: Case Study: Cement Companies
The cement industry, as a primary trade, plays an important role in the development of a country's organization. This industry in Iran, however, despite profuse benefits such as high-value mines, faces many challenges. Problems such as exploitation of the production call for research in this area. The main purpose of this paper is to examine the Eco-efficiency in Iran's 2...
Land Cover Classification Using IRS-1D Data and a Decision Tree Classifier
Land cover is one of the basic data layers in a geographic information system for physical planning and environmental monitoring. Digital image classification is generally performed to produce land cover maps from remote sensing data, particularly for large areas. In the present study the multispectral image from IRS LISS-III image along with ancillary data such as vegetation indices, principal componen...
Limestone chemical components estimation using image processing and pattern recognition techniques
In this study, an ore grade estimation model based on image analysis was developed. The study was performed at a limestone mine in central Iran. The samples were collected from different parts of the mine and crushed in size from 2.58 cm down to 15 cm. The images of the samples were taken in an appropriate environment and processed. A total of 76 features were extracted from the identified rock sa...
Identification of the most important factors of ethnic differences in anthropometric dimensions of Iranian workers using the decision tree
Background and aims: Anthropometry is the branch of human science that considers the physical measurement of the human body, especially size and shape. One application of anthropometrical data in ergonomics is the design of working space and the development of industrialized products. So that the tools, equipment and workstations, which are designed based on the physical dimensions of the workers, ...
Journal title:
Volume, issue
Pages -
Publication date: 2007